28 research outputs found
Using Ontologies for the Design of Data Warehouses
Obtaining an implementation of a data warehouse is a complex task that forces
designers to acquire wide knowledge of the domain, thus requiring a high level
of expertise and making it a failure-prone task. Based on our experience, we
have identified a set of situations that we have faced in real-world projects
in which we believe that the use of ontologies will improve several aspects of
the design of data warehouses. The aim of this article is to describe several
shortcomings of current data warehouse design approaches and discuss the
benefit of using ontologies to overcome them. This work is a starting point for
discussing the convenience of using ontologies in data warehouse design.
Comment: 15 pages, 2 figures
Modelling ETL processes of data warehouses with UML activity diagrams
Extraction-transformation-loading (ETL) processes play an important role in a data warehouse (DW) architecture because they are responsible for integrating data from heterogeneous data sources into the DW repository. Importantly, most of the budget of a DW project is spent on designing these processes, since they are not taken into account in the early phases of the project but only once the repository is deployed. To overcome this situation, we propose using the unified modelling language (UML) to conceptually model the sequence of activities involved in ETL processes from the beginning of the project by using activity diagrams (ADs). Our approach provides designers with easy-to-use modelling elements to capture the dynamic aspects of ETL processes.
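As a rough illustration of the "sequence of activities" that the abstract above proposes to model, the following Python sketch chains extraction, transformation and loading as three explicit steps. It is not the authors' UML notation: the function names, the row format and the cleaning rule are all assumptions made up for this example.

```python
from typing import Iterable

# A row is a plain dictionary; this format is an assumption for the sketch.
Row = dict[str, object]


def extract(sources: Iterable[Iterable[Row]]) -> list[Row]:
    """Extraction: gather rows from every (hypothetical) data source."""
    return [dict(row) for source in sources for row in source]


def transform(rows: list[Row]) -> list[Row]:
    """Transformation: normalise customer names, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # illustrative cleaning rule, not taken from the paper
        cleaned.append({
            "customer": str(row["customer"]).strip().title(),
            "amount": float(row["amount"]),
        })
    return cleaned


def load(rows: list[Row], repository: list[Row]) -> int:
    """Loading: append the cleaned rows to the repository, return the count."""
    repository.extend(rows)
    return len(rows)


def run_etl(sources: Iterable[Iterable[Row]], repository: list[Row]) -> int:
    """Run the three activities in order, as one would draw them in a diagram."""
    return load(transform(extract(sources)), repository)
```

Each function corresponds to one activity, so the call chain in `run_etl` mirrors the control flow that an activity diagram would make explicit at design time.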
Definition and validation of measures for ETL processes in data warehouses [Definición y validación de medidas para procesos ETL en almacenes de datos]
In data warehousing, ETL (Extract, Transform, and Load) processes are in charge of extracting the data from data sources that will be contained in the data warehouse. Due to their relevance, the quality of these processes should be formally assessed from the early stages of development, in order to avoid making bad decisions as a result of incorrect data. In this paper, a set of measures is presented to evaluate the structural complexity of ETL process models at the conceptual level. Moreover, this study is accompanied by one controlled experiment whose aim is the empirical validation of the proposed measures. The use of these measures can help designers predict the effort associated with the maintenance tasks of ETL processes. This proposal is based on UML (Unified Modeling Language) activity diagrams for modeling ETL processes, and on the FMESP (Framework for the Modeling and Evaluation of Software Processes) framework for the validation of the measures.
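The abstract above does not list the proposed measures, so the sketch below only illustrates the general idea of measuring structural complexity on an activity-diagram-like model. The `ActivityDiagram` class and the three counts it reports are hypothetical stand-ins chosen for this example, not the measures defined or validated in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ActivityDiagram:
    """Minimal stand-in for an ETL activity diagram: nodes and directed edges."""
    activities: set[str] = field(default_factory=set)
    transitions: list[tuple[str, str]] = field(default_factory=list)

    def add_transition(self, source: str, target: str) -> None:
        # Adding a transition also registers both endpoint activities.
        self.activities.update((source, target))
        self.transitions.append((source, target))


def structural_measures(diagram: ActivityDiagram) -> dict[str, float]:
    """Return simple size counts (hypothetical measures, for illustration)."""
    n_activities = len(diagram.activities)
    n_transitions = len(diagram.transitions)
    ratio = n_transitions / n_activities if n_activities else 0.0
    return {
        "activities": n_activities,
        "transitions": n_transitions,
        "transitions_per_activity": ratio,
    }
```

Counts of this kind are cheap to compute on a conceptual model, which is what makes it plausible to assess them before any ETL code exists.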
Edinburgh International Meeting [11-13 March 2022]
Competition analysis
Individual reports [double format]
Edinburgh International Meeting [11-13 March 2022]. Real Federación Española de Natación
Towards readable layouts for modeling data warehouses
Data warehouses are large-scale databases that are usually managed by means of diagram-based conceptual models. However, the complexity of those models often imposes significant design challenges. In particular, this article studies their different underlying graph layouts. The working hypothesis is that graph layouts influence diagram readability, with the latter being significant for facilitating the design process. We define the main viewpoints involved in conceptual modeling. For each one, surveyed as well as alternative layouts were evaluated against a set of aesthetics and efficiency measures. As a result, more readable graph layouts than those found in the literature were identified
Towards a model-driven engineering approach of data mining
Nowadays, data mining is based on low-level specifications of the employed techniques, typically bound to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a true software-engineering process. Here, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on the analysis problem via conceptual data-mining models instead of low-level programming tasks related to the technical details of the underlying platform. These tasks are now entrusted to the model-transformations scaffolding. This work has been supported by the ESPIA (TIN2007-67078) project (Spanish Ministry of Education), and by the QUASIMODO (PAC08-0157-0668) project (Castilla-La Mancha Ministry of Education). Jesús Pardillo and Jose-Norberto Mazón are funded by the Spanish Ministry of Education (FPU grants AP2006-00332 and AP2005-1360).
Specifying aggregation functions in multidimensional models with OCL
Multidimensional models are at the core of data warehouse systems, since they allow decision makers to define early on the relevant information and queries that are required to satisfy their information needs. The use of aggregation functions is a cornerstone in the definition of these multidimensional queries. However, current proposals for multidimensional modeling lack the mechanisms to define aggregation functions at the conceptual level: multidimensional queries can only be defined once the rest of the system has already been implemented, which requires much effort and expertise. The goal of this paper is therefore to extend the Object Constraint Language (OCL) with a predefined set of aggregation functions. Our extension facilitates the definition of platform-independent queries as part of the specification of the conceptual multidimensional model of the data warehouse. These queries are automatically implemented with the rest of the data warehouse during the code-generation phase. The OCL extensions proposed in this paper have been validated by using the USE tool. Work supported by the projects: TIN2008-00444, ESPIA (TIN2007-67078) from the Spanish Ministry of Education and Science (MEC), QUASIMODO (PAC08-0157-0668) from the Castilla-La Mancha Ministry of Education and Science (Spain), and DEMETER (GVPRE/2008/063) from the Valencia Government (Spain). Jesús Pardillo is funded by MEC under FPU grant AP2006-00332.
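To make the role of aggregation functions in a multidimensional query concrete, here is a small Python sketch of a roll-up: facts are grouped by a dimension level and one measure is aggregated per group. This is plain Python, not OCL, and it does not reproduce the extension proposed in the paper. The `roll_up` helper and the sample field names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Mapping


def roll_up(
    facts: Iterable[Mapping[str, object]],
    level: str,
    measure: str,
    aggregate: Callable[[list], object],
) -> dict[Hashable, object]:
    """Group facts by one dimension level and aggregate one measure per group."""
    groups: dict[Hashable, list] = defaultdict(list)
    for fact in facts:
        groups[fact[level]].append(fact[measure])
    # Apply the aggregation function (e.g. sum, max, min) to each group.
    return {key: aggregate(values) for key, values in groups.items()}
```

For example, `roll_up(sales, "city", "amount", sum)` totals a hypothetical `amount` measure per city. Passing the aggregation function as a parameter keeps the query independent of how the facts are stored, which is loosely analogous to the platform independence the abstract describes.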
Towards the conceptual specification of statistical functions with OCL
Current proposals for designing information systems lack the mechanisms to define statistical functions at the conceptual level. Therefore, queries containing these kinds of functions are defined once the rest of the system has already been implemented, which requires much effort and expertise. The goal of this paper is therefore to show the benefits of extending the Object Constraint Language (OCL) with a predefined set of statistical functions. Work supported by the projects: TIN2008-00444, ESPIA (TIN2007-67078) from the Spanish Ministry of Education and Science (MEC), QUASIMODO (PAC08-0157-0668) from the Castilla-La Mancha Ministry of Education and Science (Spain), and DEMETER (GVPRE/2008/063) from the Valencia Government (Spain). Jose-Norberto Mazón and Jesús Pardillo are funded by MEC under FPU grants AP2005-1360 and AP2006-00332, respectively. Jordi Cabot is funded by the 2007 BP-A 00128 grant (Catalan Government).